perf(native): scope incremental rebuild to truly-changed files (#1012) #1027
carlos-alm merged 5 commits into main
Conversation
Native 1-file incremental rebuilds were 4.3x slower than WASM (876ms vs 203ms) because `run_pipeline` re-parsed the full reverse-dep cone (47 files for a 1-file change), which cascaded into the insert/structure/roles/analysis phases. This PR adopts the WASM save+reconnect strategy: save reverse-dep → changed-file edges before purge, then reconnect them to the new node IDs after Stage 5. Reverse-dep files are no longer re-parsed; only their affected edges are reconstructed.

Result: 876ms → 43ms (95% faster, 0.78x WASM).

- insertMs: 185ms → 0.2ms
- structureMs: 73ms → 3.1ms
- rolesMs: 311ms → 18.3ms

Edge counts are identical between full and 1-file rebuilds (37134), confirming no edge loss or duplication. The existing `1-file incremental ratio` gate in `benchmark-parity-gate.mjs` (limit 1.5) enforces the ceiling.
PR Review: Native Incremental Rebuild Performance Optimization

Claude finished @carlos-alm's task: reviewing PR #1027 for performance improvements to native incremental rebuilds.
Root Cause Analysis

Problem Being Solved: Native 1-file incremental rebuilds were 4.3× slower than WASM (876ms vs 203ms) because `run_pipeline` re-parsed the full reverse-dep cone.

Why This Approach: The PR adopts WASM's proven save+reconnect strategy: save reverse-dep → changed-file edges before purge, reconnect to new node IDs after Stage 5. This eliminates the need to re-parse reverse-dep files; only their affected edges are reconstructed. The approach is well-tested in the WASM pipeline and produces identical edge counts.

Risk Assessment: LOW RISK. This adopts a proven strategy from the WASM engine rather than inventing new logic. The PR reports identical edge counts (37134) between full and incremental rebuilds, indicating no data loss.

Configuration/Test Integrity Analysis

✅ PASSED: no configuration changes detected that lower standards.
Test Coverage: Strong existing regression protection for this exact scenario:
Code Quality AssessmentArchitecture: Clean implementation following established patterns
Security: No security concerns identified
Performance: Implementation optimized for the stated goal
Backlog Compliance Assessment
Final Assessment
Greptile Summary

This PR ports the WASM/JS incremental rebuild optimization to the native pipeline: instead of re-parsing every reverse-dep file in the changed-file cone, it saves the reverse-dep → changed-file edges before purge and reconnects them to new node IDs after Stage 5, cutting 1-file incremental time from 876 ms to 43 ms. It also fixes the role-classification gap for removal-only incremental builds.

Confidence Score: 5/5

Safe to merge: no P0/P1 issues found; previous review threads are fully addressed. The logic is well-reasoned and mirrors the proven WASM path. Edge save→purge→reconnect ordering is correct, deduplication via INSERT OR IGNORE is in place, the dropped-edge diagnostic is now logged, and the removal-only classification gap is patched. No new correctness or security issues identified. No files require special attention.
Sequence Diagram

```mermaid
sequenceDiagram
    participant BP as build_pipeline
    participant CD as change_detection
    participant DB as SQLite
    Note over BP,DB: Incremental rebuild (new strategy)
    BP->>CD: save_reverse_dep_edges(changed_paths)
    CD->>DB: SELECT edges rdep→changed (pre-purge)
    DB-->>CD: SavedReverseDepEdge[]
    CD-->>BP: saved_edges
    BP->>CD: find_reverse_dependencies(removed_set) [if removals]
    CD->>DB: SELECT files that import removed files
    DB-->>CD: removal_reverse_deps
    BP->>CD: purge_changed_files(files_to_purge, &[])
    CD->>DB: DELETE nodes/edges for changed+removed files
    Note over BP,DB: Stage 4-5: parse & insert only changed files
    BP->>DB: parse_files_parallel(changed only)
    BP->>DB: do_insert_nodes (new node IDs)
    Note over BP,DB: Stage 7: rebuild edges for changed files
    BP->>DB: build_import_edges / build_call_edges
    BP->>CD: reconnect_reverse_dep_edges(saved_edges)
    CD->>DB: SELECT new node id by (name,kind,file,~line)
    CD->>DB: INSERT OR IGNORE edge (old_source_id, new_target_id)
    CD-->>BP: (reconnected, dropped)
    Note over BP,DB: Stage 8: classify roles
    BP->>DB: do_classify_incremental(changed_files + removal_reverse_deps)
```
Reviews (3). Last reviewed commit: "fix(native): reclassify reverse-deps of ..."
```rust
// Reconnect saved reverse-dep edges to new node IDs (#1012). Mirrors
// `reconnectReverseDepEdges` in build-edges.ts — for each saved edge,
// look up the new target node and recreate the edge with the original
```
Dropped edges silently discarded
`reconnect_reverse_dep_edges` returns `(reconnected, dropped)`, but the return value is ignored here. Dropped edges occur when a target node can't be found after re-insert (e.g. a renamed or deleted symbol), and they are silently swallowed with no log output. In the normal "symbol still exists" path this is fine, but in practice it makes partial edge loss during an incremental rebuild very hard to diagnose.
Suggested change:

```rust
// Reconnect saved reverse-dep edges to new node IDs (#1012). Mirrors
// `reconnectReverseDepEdges` in build-edges.ts — for each saved edge,
// look up the new target node and recreate the edge with the original
if !saved_reverse_dep_edges.is_empty() {
    let (reconnected, dropped) =
        change_detection::reconnect_reverse_dep_edges(conn, &saved_reverse_dep_edges);
    if dropped > 0 {
        eprintln!("[codegraph] reconnect_reverse_dep_edges: {reconnected} reconnected, {dropped} dropped (target nodes not found)");
    }
}
```
Fixed in c268648: the `reconnect_reverse_dep_edges` return value is now captured and an `eprintln!` warning is emitted when `dropped > 0`, so partial edge loss is diagnosable. The normal happy path (no drops) stays quiet.
```rust
Ok(new_id) => {
    if insert_stmt
        .execute(rusqlite::params![
            s.source_id,
            new_id,
            &s.edge_kind,
            s.confidence,
            s.dynamic,
        ])
        .is_ok()
    {
        reconnected += 1;
    }
```
`reconnected` counter inflated by INSERT OR IGNORE no-ops

`insert_stmt.execute(…).is_ok()` returns true both for a genuine insert (one row affected) and for a silently ignored duplicate (zero rows affected, because INSERT OR IGNORE suppresses the constraint error). The counter will overcount whenever two saved entries resolve to the same (source_id, new_target_id, kind). Since the return value is only used for diagnostics this is low-impact, but it makes the `(reconnected, dropped)` log misleading.
Suggested change:

```rust
// Before:
Ok(new_id) => {
    if insert_stmt
        .execute(rusqlite::params![
            s.source_id,
            new_id,
            &s.edge_kind,
            s.confidence,
            s.dynamic,
        ])
        .is_ok()
    {
        reconnected += 1;
    }

// After:
Ok(new_id) => {
    match insert_stmt.execute(rusqlite::params![
        s.source_id,
        new_id,
        &s.edge_kind,
        s.confidence,
        s.dynamic,
    ]) {
        Ok(n) if n > 0 => reconnected += 1,
        Ok(_) => {} // INSERT OR IGNORE: duplicate skipped
        Err(_) => dropped += 1,
    }
}
```
Fixed in 2132d99: replaced `is_ok()` with a match on the affected-row count. Only `Ok(n) if n > 0` increments `reconnected`; `Ok(0)` (the INSERT OR IGNORE no-op for duplicates) is now silently skipped instead of inflating the diagnostic counter. Genuine errors still increment `dropped`.
Codegraph Impact Analysis

3 functions changed → 0 callers affected across 0 files
INSERT OR IGNORE returns Ok(0) for duplicate-row no-ops, so the previous `is_ok()` check inflated `reconnected` whenever two saved entries resolved to the same (source_id, new_target_id, kind). Match on the rows-affected count and only increment when n > 0; leave the dropped counter for genuine errors. Diagnostic-only — no functional change to edge data. Impact: 1 functions changed, 0 affected
`reconnect_reverse_dep_edges` returns `(reconnected, dropped)` but the caller previously discarded the result. Dropped edges occur when a saved target node can't be re-located after re-insert (e.g. the symbol was renamed or deleted), and silently swallowing them makes partial edge loss during incremental rebuild hard to diagnose. Surface the count via eprintln when dropped > 0 — same observability pattern other native diagnostics use. Impact: 1 functions changed, 0 affected
```rust
let changed_paths: Vec<String> =
    parse_changes.iter().map(|c| c.rel_path.clone()).collect();

if !opts.no_reverse_deps.unwrap_or(false) {
    saved_reverse_dep_edges =
        change_detection::save_reverse_dep_edges(conn, &changed_paths);
}
```
Removed files' reverse-dep roles not reclassified
`changed_paths` is built from `parse_changes` only; `change_result.removed` is excluded. `save_reverse_dep_edges` therefore never saves edges from files that import a removed file (call it B→A, where A is deleted). Those edges are correctly purged when A's nodes are deleted. But because B never enters `file_symbols`, it also never enters `changed_files`, so `do_classify_incremental` never reclassifies it. B's fan-out silently decreases in the DB while its role record stays stale.

In the old code `changed_rel_paths` included removed files, `find_reverse_dependencies` found B, and B was re-parsed and landed in `changed_files` for reclassification. The new strategy intentionally skips re-parsing, but the classification gap for the removal case was not compensated. For a removal-only incremental build `changed_files` is empty, so neither `do_classify_incremental` nor `do_classify_full` is invoked, leaving every file that imported the deleted file with a permanently stale role until a future build touches it again.

A minimal fix: identify reverse-dep files of removed entries before purge (the existing `find_reverse_dependencies` with `change_result.removed` as the seed would work), then union those file paths into the seed passed to `do_classify_incremental`. No re-parse is needed; reclassification reads fan-in/fan-out from the current (post-purge) edges table.
There was a problem hiding this comment.
Fixed in b6baec0 — applied your suggested approach: before purge, compute reverse-deps of change_result.removed via find_reverse_dependencies and union them into the seed passed to do_classify_incremental. No re-parse needed; the classifier reads fan-in/fan-out from the post-purge edges table.
This also covers the removal-only build case (where changed_files would otherwise be empty and classification skipped entirely). Verified the change doesn't regress: 181 Rust unit tests + 23 incremental integration tests + 15 roles tests all pass locally. The WASM side at detect-changes.ts:467 already includes ctx.removed in its findReverseDependencies seed, so this restores parity.
When a file is removed during incremental build, its nodes are purged along with edges pointing at them. Files that imported the removed file were never re-parsed (correct, by design) but their role records became stale: fan-out dropped silently because do_classify_incremental was only seeded with parsed files. For removal-only builds, classification was skipped entirely. Compute reverse-deps of removed entries before purge (find_reverse_deps with change_result.removed as seed, mirroring WASM/JS) and union them into the seed passed to do_classify_incremental. No re-parse needed — the classifier reads fan-in/fan-out from the post-purge edges table. Impact: 1 functions changed, 0 affected
Summary

- `run_pipeline` re-parsed the full reverse-dep cone (47 files for a 1-file change), cascading into the insert/structure/roles/analysis phases.

Per-phase improvements

- insertMs: 185ms → 0.2ms
- structureMs: 73ms → 3.1ms
- rolesMs: 311ms → 18.3ms

Acceptance criteria (#1012)

- `benchmark-parity-gate.mjs` already checks the `1-file incremental ratio` at limit 1.5

Verification / Test plan

- Run `incremental-benchmark.ts` to confirm timing